AITopics | student network

Collaborating Authors

student network

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

466473650870501e3600d9a1b4ee5d44-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 16:36:20 GMT

artificial intelligence, machine learning, perturbation, (17 more...)

Neural Information Processing Systems

Country: Asia > South Korea (0.28)

Industry: Education (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Discovering and Overcoming Limitations of Noise-engineered Data-free Knowledge Distillation

Neural Information Processing SystemsApr-25-2026, 00:49:11 GMT

Distillation in neural networks using only the samples randomly drawn from a Gaussian distribution is possibly the most straightforward solution one can think of for the complex problem of knowledge transfer from one network (teacher) to the other (student). If successfully done, it can eliminate the requirement of teacher's training data for knowledge distillation and avoid often arising privacy concerns in sensitive applications such as healthcare. There have been some recent attempts at Gaussian noise-based data-free knowledge distillation, however, none of them offer a consistent or reliable solution. We identify the shift in the distribution of hidden layer activation as the key limiting factor, which occurs when Gaussian noise is fed to the teacher network instead of the accustomed training data. We propose a simple solution to mitigate this shift and show that for vision tasks, such as classification, it is possible to achieve a performance close to the teacher by just using the samples randomly drawn from a Gaussian distribution.

artificial intelligence, gaussian noise, machine learning, (14 more...)

Neural Information Processing Systems

Industry:

Education (0.94)
Information Technology > Security & Privacy (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

1d6408264d31d453d556c60fe7d0459e-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 00:10:47 GMT

artificial intelligence, dataset, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report > Promising Solution (0.68)

Industry:

Education (0.93)
Government > Regional Government (0.68)
Health & Medicine (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Escape dynamics and implicit bias of one-pass SGD in overparameterized quadratic networks

Bocchi, Dario, Regimbeau, Theotime, Lucibello, Carlo, Saglietti, Luca, Cammarota, Chiara

arXiv.org Machine LearningApr-6-2026

We analyze the one-pass stochastic gradient descent dynamics of a two-layer neural network with quadratic activations in a teacher--student framework. In the high-dimensional regime, where the input dimension $N$ and the number of samples $M$ diverge at fixed ratio $α= M/N$, and for finite hidden widths $(p,p^*)$ of the student and teacher, respectively, we study the low-dimensional ordinary differential equations that govern the evolution of the student--teacher and student--student overlap matrices. We show that overparameterization ($p>p^*$) only modestly accelerates escape from a plateau of poor generalization by modifying the prefactor of the exponential decay of the loss. We then examine how unconstrained weight norms introduce a continuous rotational symmetry that results in a nontrivial manifold of zero-loss solutions for $p>1$. From this manifold the dynamics consistently selects the closest solution to the random initialization, as enforced by a conserved quantity in the ODEs governing the evolution of the overlaps. Finally, a Hessian analysis of the population-loss landscape confirms that the plateau and the solution manifold correspond to saddles with at least one negative eigenvalue and to marginal minima in the population-loss geometry, respectively.

artificial intelligence, machine learning, matrix, (18 more...)

arXiv.org Machine Learning

2604.03068

Country:

Europe > Italy > Lombardy > Milan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Lazio > Rome (0.04)

Genre: Research Report (0.82)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

Paraphrasing Complex Network: Network Compression via Factor Transfer

Neural Information Processing SystemsMar-16-2026, 21:56:10 GMT

Many researchers have sought ways of model compression to reduce the size of a deep neural network (DNN) with minimal performance degradation in order to use DNNs in embedded systems. Among the model compression methods, a method called knowledge transfer is to train a student network with a stronger teacher network. In this paper, we propose a novel knowledge transfer method which uses convolutional operations to paraphrase teacher's knowledge and to translate it for the student. This is done by two convolutional modules, which are called a paraphraser and a translator. The paraphraser is trained in an unsupervised manner to extract the teacher factors which are defined as paraphrased information of the teacher network. The translator located at the student network extracts the student factors and helps to translate the teacher factors by mimicking them. We observed that our student network trained with the proposed factor transfer method outperforms the ones trained with conventional knowledge transfer methods.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

Add feedback

4ec0b6648bdf487a2f1c815924339022-Paper-Conference.pdf

Neural Information Processing SystemsFeb-19-2026, 03:15:02 GMT

In knowledge distillation, previous feature distillation methods mainly focus on the design of loss functions and the selection of the distilled layers, while the effectofthefeatureprojector between thestudent andtheteacher remains underexplored.

artificial intelligence, machine learning, projector, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

DiversityMattersWhenLearningFromEnsembles

Neural Information Processing SystemsFeb-19-2026, 01:29:06 GMT

Whilesomerecent works propose to distill an ensemble model into a single model to reduce such costs,thereisstillaperformance gapbetween theensemble anddistilledmodels.

artificial intelligence, machine learning, perturbation, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

How a student becomes a teacher: learning and forgetting through Spectral methods

Neural Information Processing SystemsFeb-16-2026, 20:30:25 GMT

The above scheme proves particularly relevant when the student network is overparameterized (namely, when larger layer sizes are employed) as compared to the underlying teacher network. Under these operating conditions, it is tempting to speculate that the student ability to handle the given task could be eventually stored in a sub-portion of the whole network.

artificial intelligence, machine learning, matrix, (20 more...)

Neural Information Processing Systems

Country: